RAID and NASD
NASD comments from reading:
- bad explanation of architecture
- file manager not well explained
RAID comments from reading:
- Failure assumptions may be limited (e.g. no correlated failures – what about batch failures?)
- Caching not considered
RAID background
Problem: technology trends
- computers getting larger, need more disk bandwidth
- disk bandwidth not riding Moore's law
- faster CPU enables more computation to support storage
- data intensive applications
Approaches:
- SLED: single large expensive disk
- RAID: redundant array of (independent, inexpensive) disks
NOTE:
- Disk arrays had been done before
- Contribution of this paper is a taxonomy and a way to compare them and organize them
Key ideas:
- striping: write blocks of a file to multiple disks, can read/write in parallel
- Redundancy: write extra data to extra disks for failure recovery, e.g. parity, ECC, or duplicate data. Redundancy can even improve read performance – with mirrored copies there is a choice of disk (latency) or two disks to read from (throughput)
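A minimal sketch of the parity flavor of redundancy (block contents are made up; XOR parity as used in RAID 3-5):

    def xor_blocks(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    data_blocks = [b"AAAA", b"BBBB", b"CCCC"]      # one block per data disk in the stripe
    parity = b"\x00" * 4
    for blk in data_blocks:
        parity = xor_blocks(parity, blk)           # parity disk stores XOR of the stripe

    rebuilt = parity                               # disk 1 fails: XOR the survivors back in
    for i, blk in enumerate(data_blocks):
        if i != 1:
            rebuilt = xor_blocks(rebuilt, blk)
    assert rebuilt == data_blocks[1]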
Why arrays?
- Cheaper disks
- Lower power
- Smaller enclosures
- Higher reliability
o Can survive a disk failure
- Larger bandwidth
o Can read or write multiple disks at a time
How do you compare disk setups?
- Price?
- Power?
- Size?
- Performance?
o What performance?
o Large reads
o Small reads
o Large writes
o Small writes
o Read / modify / write (transaction processing)
Organization:
- take N disks, put into groups of G
RAID versions:
JBOD: just a bunch of disks, mount as separate volumes
- Read / write performance for a file limited to single disk
- Reliability for a byte is same as single disk, but file system can tolerate some disk failures with partial data loss
RAID 0: striping
- Striping data across disks
- Best overall performance: G reads/sec, G writes/sec
- Worst reliability: MTTF = MTTF(disk) / G
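A sketch of the striping arithmetic (round-robin block placement assumed; the last lines restate the MTTF formula above):

    def locate(block_num, G):
        # Logical block b -> (disk b mod G, offset b div G) under round-robin striping.
        return block_num % G, block_num // G

    G = 4
    print([locate(b, G) for b in (0, 1, 4, 5)])    # [(0, 0), (1, 0), (0, 1), (1, 1)]

    # Any one disk failing loses data, so MTTF drops linearly with group size:
    mttf_disk_hours = 30_000
    print(mttf_disk_hours / G)                     # 7500.0 hours for the 4-disk stripe set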
RAID 1: mirroring
- store all data on two disks
- write to both disks
- read from whichever disk is better positioned (faster access) – see the sketch after this list
- Write performance = single disk
- Read performance = double
- Overhead is 100%
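A tiny sketch of picking the better-positioned mirror for a read (track numbers are illustrative):

    def pick_mirror(target_track, head_positions):
        # Both copies hold the data; read from the copy whose head is closest to the target track.
        return min(range(len(head_positions)), key=lambda d: abs(head_positions[d] - target_track))

    print(pick_mirror(500, [100, 620]))            # -> 1: mirror 1's head is nearer track 500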
RAID 2: bit-wise ECC
- stripe data across disks in small units
- Store ECC (Hamming code) bit-wise across multiple check disks
- All reads / writes hit all disks
- Can detect / correct lots of errors
- Bad performance for small requests: every access involves the whole group, so per-disk throughput on small requests is roughly 1/G of a standalone disk
- Multiple check disks also cost more capacity than RAID 3's single parity disk
RAID 3: bit parity
- rely on the disk controller to report which disk failed, so a single parity disk is enough for correction (no ECC needed to locate the error)
- Still read all data disks on every access (parity read only on failure); writes hit all disks including parity
RAID 4: block parity
- use a single parity disk for error correction; rely on disk controllers for error detection
- Can read from a single disk (parity is not needed for reads)
- can write to two disks (data disk + update parity)
- Bottleneck: single parity disk for all writes
- Small writes require 4 accesses: read old block and old parity, then write new block and new parity
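A sketch of those four accesses against toy in-memory disks (XOR parity; only the target data disk and the parity disk are touched):

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    BLOCK = 4
    disks = {d: bytearray(BLOCK * 4) for d in range(5)}            # toy disks; disk 4 holds parity

    def small_write(data_disk, block_no, new_data, parity_disk=4):
        off = block_no * BLOCK
        old_data   = bytes(disks[data_disk][off:off + BLOCK])      # 1. read old data block
        old_parity = bytes(disks[parity_disk][off:off + BLOCK])    # 2. read old parity block
        new_parity = xor(xor(old_parity, old_data), new_data)      # parity' = parity ^ old ^ new
        disks[data_disk][off:off + BLOCK]   = new_data             # 3. write new data block
        disks[parity_disk][off:off + BLOCK] = new_parity           # 4. write new parity block

    small_write(2, 0, b"WXYZ")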
RAID 5: distributed parity
- same as level 4 but parity disk changes for each block
- Removes hotspot of parity disk
- Large writes efficient – just one extra access for parity
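One possible rotation of the parity location (the scheme only requires that it vary per stripe; this particular formula is illustrative):

    def parity_disk(stripe, G):
        # Rotate the parity location per stripe; RAID 4 would always return G - 1.
        return (G - 1 - stripe) % G

    G = 5
    print([parity_disk(s, G) for s in range(6)])   # [4, 3, 2, 1, 0, 4]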
RAID 6: more error correction
- 2 parity/check disks allow the array to tolerate 2 simultaneous disk failures
Throughput per dollar (relative to a single disk):

         | small read | small write   | large read | large write | storage efficiency | reason
RAID 0   | 1          | 1             | 1          | 1           | 1                  |
RAID 1   | 1          | 1/2           | 1          | 1/2         | 1/2                | extra (mirror) disk
RAID 3   | 1/G        | 1/G           | (G-1)/G    | (G-1)/G     | (G-1)/G            | one disk (parity) doesn't contribute
RAID 5   | 1          | max(1/G, 1/4) | 1          | (G-1)/G     | (G-1)/G            |
Notes: RAID 2 is inferior – like RAID 3 but with more check drives. RAID 4 is inferior to RAID 5 – similar best case, but write throughput is limited by the single parity disk
Choices of RAID
- QUESTION: what should you choose, when?
- Issues:
o Cost of disks – is it relevant? Perhaps space/power more relevant
o Workload: lots of small reads/writes favors RAID 1; mostly large reads and writes favors RAID 5
NASD
Technology trends:
- need distributed file system
- file server is bottleneck between client and data
- QUESTION: how do you scale up a file system?
o A: partition
¤ Still limited by disk → server bandwidth
¤ Partitioning usually limited to certain areas, e.g. volumes, mount points
Approaches:
- SAN: storage area networks
o attach disks to network
o Block level interface (read block, write block)
o Cooperating file systems to make it work
o Offers block-level management: backup, shadow, RAID
- NAS: network attached storage
o Richer interface to data: e.g. file systems, objects
o Inherits SAN benefits if implemented on SAN
- NASD: network attached disks
PROBLEM STATEMENT:
- bandwidth to clients limited by need for a centralized file manager
o QUESTION: Why?
o FS semantics, consistency, naming
- Routing data through the file server requires unnecessary copies:
o off disk onto the network
o network into server memory
o server memory back onto the network
o network into client memory
ENABLING TECHNOLOGY:
- I/O bound applications: multimedia, databases, data mining
- New drive interfaces: drives can be attached directly to the network (e.g. iSCSI)
- Smarter drives – more on-drive processing, so more opportunity to program them
- Storage networks and computer networks converging
- Storage servers (e.g. NFS, AFS) not cost effective: the server is the dominant cost unless many disks are attached
NASD Idea:
- separate metadata & management from data transfer
- Provide a security mechanism so the disk can sit directly on the network, without interposed control
- Principles:
o Data transferred directly from disk to client, not through the server
o Asynchronous oversight: client can perform operations w/o synchronous access to manager. E.g. can read / write data without contacting manager. Policy info provided by manager as a capability, enforced by disk
o Object based interface: not blocks or files, but variable-length objects. File manager can use them as whole files or stripes. Provides more semantics for disk – more information available
- Client talks to the file manager to open files, create directories, etc.
- File manager returns a capability that allows client to access disk directly
NASD interface:
- functions to access objects
- Secured with capabilities (like Kerberos tickets)
o Encrypted with disk key
o Contains private session key
o Client must prove it knows the session key with an authenticator
o May contain policy for disk to enforce
o Contains byte range for access (e.g. can limit to part of the object)
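A toy sketch of the capability flow (field names and the HMAC construction are assumptions for illustration, not the actual NASD wire format): the file manager mints a capability under a key it shares with the drive, the client proves it holds the session key with an authenticator over its request, and the drive checks everything locally.

    import hmac, hashlib, json, os

    DRIVE_KEY = os.urandom(32)        # key shared by file manager and drive (never by clients)

    def mint_capability(object_id, rights, byte_range):
        """File manager: bind a session key to a policy the drive will enforce."""
        args = {"object": object_id, "rights": rights, "range": byte_range}
        blob = json.dumps(args, sort_keys=True).encode()
        session_key = hmac.new(DRIVE_KEY, blob, hashlib.sha256).digest()   # drive can re-derive this
        return args, session_key          # args travel with requests; session key goes only to the client

    def make_authenticator(session_key, request):
        """Client: prove it knows the session key by MACing its request."""
        return hmac.new(session_key, json.dumps(request, sort_keys=True).encode(), hashlib.sha256).digest()

    def drive_check(args, request, authenticator):
        """Drive: re-derive the session key from the capability args and verify the request."""
        blob = json.dumps(args, sort_keys=True).encode()
        session_key = hmac.new(DRIVE_KEY, blob, hashlib.sha256).digest()
        ok_mac = hmac.compare_digest(authenticator, make_authenticator(session_key, request))
        lo, hi = args["range"]
        in_range = lo <= request["offset"] and request["offset"] + request["length"] <= hi
        return ok_mac and request["op"] in args["rights"] and in_range

    # Flow: manager mints the capability, client issues reads, drive enforces policy with no callback.
    args, key = mint_capability("obj7", ["read"], (0, 1 << 20))
    req = {"op": "read", "offset": 0, "length": 4096}
    print(drive_check(args, req, make_authenticator(key, req)))   # True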
USING NASD
NFS:
- files == objects
- Lookup done on server, return capabilities
- Attributes map onto object attributes, or are stored uninterpreted by the disk and interpreted by the client-side NFS library
AFS:
- files == objects
- Clients parse directories, must ask file manager for a capability to a file
- Consistency model (invalidate callbacks on write) changes because writes not reported to manager; manager instead invalidates on open-for-write
- Quotas handled by granting access to more data than current size (update after close)
NASD PFS
- parallel file system by striping data across disks
- New storage layer, Cheops, implements striping (RAID 0) but same object interface
o Translates an access to a striped object into accesses to the component objects (with a capability for each) that the client then issues directly to the drives
o Stripes data in 512 KB chunks
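A sketch of how a striping layer like Cheops might map a byte range of a striped object onto per-drive component objects (the 512 KB stripe unit is from the notes; the layout and names are otherwise illustrative):

    STRIPE_UNIT = 512 * 1024   # 512 KB chunks

    def map_request(offset, length, num_drives):
        # Map a byte range of the logical object onto (drive, component offset, length) pieces,
        # one component object per NASD drive; the client reads each piece with its capability.
        pieces = []
        end = offset + length
        while offset < end:
            chunk = offset // STRIPE_UNIT
            drive = chunk % num_drives                        # RAID 0 style round-robin placement
            within = offset % STRIPE_UNIT
            comp_off = (chunk // num_drives) * STRIPE_UNIT + within
            n = min(STRIPE_UNIT - within, end - offset)
            pieces.append((drive, comp_off, n))
            offset += n
        return pieces

    print(map_request(512 * 1024 - 10, 30, num_drives=4))
    # -> [(0, 524278, 10), (1, 0, 20)]: the request crosses a stripe-unit boundary onto two drives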